51 research outputs found

    Validity and efficiency of conformal anomaly detection on big distributed data

    Get PDF
    Conformal Prediction is a recently developed framework for reliable confident predictions. In this work we discuss its possible application to big data coming from different, possibly heterogeneous data sources. On example of anomaly detection problem, we study the question of saving validity of Conformal Prediction in this case. We show that the straight forward averaging approach is invalid, while its easy alternative of maximizing is not very efficient because of its conservativeness. We propose the third compromised approach that is valid, but much less conservative. It is supported by both theoretical justification and experimental results in the area of energy engineering

    Distributed Conformal Anomaly Detection

    Get PDF

    On-line predictive linear regression

    Full text link
    We consider the on-line predictive version of the standard problem of linear regression; the goal is to predict each consecutive response given the corresponding explanatory variables and all the previous observations. We are mainly interested in prediction intervals rather than point predictions. The standard treatment of prediction intervals in linear regression analysis has two drawbacks: (1) the classical prediction intervals guarantee that the probability of error is equal to the nominal significance level epsilon, but this property per se does not imply that the long-run frequency of error is close to epsilon; (2) it is not suitable for prediction of complex systems as it assumes that the number of observations exceeds the number of parameters. We state a general result showing that in the on-line protocol the frequency of error for the classical prediction intervals does equal the nominal significance level, up to statistical fluctuations. We also describe alternative regression models in which informative prediction intervals can be found before the number of observations exceeds the number of parameters. One of these models, which only assumes that the observations are independent and identically distributed, is popular in machine learning but greatly underused in the statistical theory of regression.Comment: 34 pages; 6 figures; 1 table. arXiv admin note: substantial text overlap with arXiv:0906.312

    Conformal Changepoint Detection in Continuous Model Situations

    Get PDF

    Multi-level conformal clustering:A distribution-free technique for clustering and anomaly detection

    Get PDF
    In this work we present a clustering technique called multi-level conformal clustering (MLCC). The technique is hierarchical in nature because it can be performed at multiple significance levels which yields greater insight into the data than performing it at just one level. We describe the theoretical underpinnings of MLCC, compare and contrast it with the hierarchical clustering algorithm, and then apply it to real world datasets to assess its performance. There are several advantages to using MLCC over more classical clustering techniques: Once a significance level has been set, MLCC is able to automatically select the number of clusters. Furthermore, thanks to the conformal prediction framework the resulting clustering model has a clear statistical meaning without any assumptions about the distribution of the data. This statistical robustness also allows us to perform clustering and anomaly detection simultaneously. Moreover, due to the flexibility of the conformal prediction framework, our algorithm can be used on top of many other machine learning algorithms

    Testing and Clustering with Gauss Linear Assumption for a Household Data

    Get PDF
    • …
    corecore